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Abstract 

This work extends the Integrated Nested Laplace Approximation (INLA) method to 
latent models outside the scope of latent Gaussian models, where independent components 
of the latent field can have a near-Gaussian distribution. Two important class of models that 
can be addressed with our proposed method are non-Gaussian random effects models and 
dynamic models with non-Gaussian error term for the observation and/or system equation. 
Our approach is applied to two examples and the results are compared with that obtained by 
Markov Chain Monte Carlo (MCMC), showing similar accuracy with only a small fraction of 
computational time. Implementation of the proposed extension is available in the R package 
INLA. 

Keywords: Approximate Bayesian inference, INLA, MCMC, near-Gaussian latent models 



1 Introduction 



Integrated Nested Laplace Approximation (INLA) is an approach proposed bv lRue et al.l (|2009l ) 



to perform approximate fully Bayesian inference on the class of latent Gaussian models. It 



was demonstrated in the original pa per that, when compared with the more usual 



Markov 



Chain Monte Carlo (MCMC) schemes (jRobert and Casellal . 12004 ; iGamerman and Loped . 120061 ) . 



INLA outperforms the latter both in terms of accuracy and speed. Monte Carlo averages are 
characterized by additive O p (N~ 1 / 2 ) errors, where N is the simulated sample size, meaning that 
we need 100 times more computational time to improve our estimates by one digit. Besides that, 
due to the additive nature of Monte Carlo estimates, it is even harder to accurately estimate tail 
probabilities with MCMC. On the other hand, INLA bypass the need for stochastic simulation 
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by an extensive use of simple and fast Gaussian approximations in a clever way to take advantage 
of the properties of latent Gaussian models, where for most real problems and data sets, the 
conditional posterior of the latent field is typically well behaved, being close to a Gaussian. As 
opposed to MCMC, INLA has relative error which allow for more accurate estimates of small 
quantities, as for example the estimation of tail probabilities. 

INLA is not meant to be a replacement of MCMC in applied statistics, but it is a specific 
tailored algorithm that works extremely well in the broad class of latent Gaussian models, and 
thus offer a better option in this context. This paper proposes an extension that allows INLA 
to be applied to models where some independent components of the latent field have a near- 
Gaussian distribution (see Section 13. ip . Interest for such models arise often in the literature 
but the lack of user friendly software able to handle them in a fast and accurate way might 
lead someone to stay with the more standard models, even though it might not be the best 
for their application s. Examples of such models are the survival models with gamma frailty 



(Ibrahim et al 



2001 



) where the ran dom effects has a log gamma distribution, more robust mixed 
effect models (jPinheiro et all boQll ) , where the distribution of the random effects is assumed to 



be Student's t rather than Gaussian, th us allowing for o utlier identification and accommodation 



and non-Gaussian state space models (jKitagawal . 119871 ) where the distribution of the noise in 



the state space evolution equation is assumed to be non-Gaussian. The list is, of course, much 
larger than that, and a reliable method to fit this large class of models efficiently will provide 
the right tools for the applied scientist to practice a more flexible and realistic data analysis. 

This extension is not straightforward given the central role that the latent Gaussian field 
plays in the INLA methodology In this paper, we take advantage of the types of approximations 
performed and the way they are combined in INLA to propose a new way to look at this latent 
(near-Gaussian) models of interest and show how to adapt INLA to fit this more complex class 
of models. The paper is organized as following: Section [2] will describe the latent Gaussian 
models and the INLA methodology, highlighting the importance of the Gaussian latent field to 
the success of the method. Section [3] define the sub-class of latent models of interest in this paper 
and describe our proposed extension to fit this set of models. Section [5] illustrate our approach 
with two examples and Section [5] concludes with some remarks and point to future research 
involving the topics discussed in this paper. The proposed extension is already implemented as 
part of the R package INLA, and its use is illustrated in Appendix |A^ where the R code from the 
example in Section B~T1 is displayed. 



Please visit http://www.r-inla.org/ for more information about the R package INLA. 
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2 Integrated Nested Laplace Approximation 



This section contains a brief de s cripti on of latent Gaussian models and a review of the INLA 
method proposed by 



Rue et al 



(|2009l ). Latent Gaussian models has a wide range of applica- 
tions and includes, for example, regression models, dynamic models, spatial and spatiotemporal 
models. In Section [2.11 we define the class of latent Gaussian models and its hierarchical repre- 
sentation that will make the exposition of the approximation methods described in this paper 
easier to read. The Gaussian approximation to conditional distributions of the latent Gaussian 
field, which is the core of INLA is described in Section T2.21 The INLA method applied to latent 
Gaussian models is described in Section T2.31 while the importance of the Gaussian prior on the 



latent field to the success of INLA is made explicit is Section [2? 



2.1 Latent Gaussian models 

The INLA framework was designed to deal with latent Gaussian models where the observation 
(or response) variable yi is assumed to belong to a distribution family (not necessarily part of 
the exponential family) where the mean //j is linked to a structured additive predictor rji through 
a link function <?(•), so that g(ni) = The structured additive predictor rji accounts for effects 
of various covariates in an additive way: 

n f rip 

r]i = a + ^2f (l \u j , i ) + ^2/3 k z ki + e i: (1) 

j=l k=l 

where {f {j) {-)Ys are unknown functions of the covariates u, used for example to relax linear 
relationship of covariates and to model temporal and/or spatial dependence, the {/3fc}'s represent 
the linear effect of covariates z and the {e^j's are unstructured terms. Then a Gaussian prior is 
assigned to a, {/ (i) (-)}> {fit} and {ej. 

We can also write the model described above using a hierarchical structure, where the first 
stage is formed by the likelihood function with conditional independence properties given the 
latent field x = (rj,a, f,f3) and possible hyperparameters 9±, where each data point {yi,i = 
1, ...,rid} is connected to one element in the latent field X{. Assuming that the elements of the 
latent field connected to the data points are positioned on the first elements of x, we have 

Stage 1. y\x,6i ^ Tr(y\x,ei) = l\^ 1 7r(y i \x i ,e 1 ). 

The conditional distribution of the latent field x given some possible hyperparameters 9 2 forms 
the second stage of the model and has a joint Gaussian distribution, 

Stage 2. x\0 2 ~ <k(x\0 2 ) = M{x; 0, Q" 1 ^)), 
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where jV(-; /x, Q^ 1 ) denotes a multivariate Gaussian distribution with mean vector [i and a pre- 
cision matrix Q. In most applications, the latent Gaussian field have conditional independence 
properties, which translates into a sparse precision matrix Q(#2)> which is of extreme impor- 
tance for the numerical algorithms that will follow. The latent field x may have additional linear 
constraints of the form Ax = e for an k x n matrix A of rank k, where k is the number of 
constraints and n the size of the latent field. The hierarchical model is then completed with an 
appropriate prior distribution for the hyperparameters of the model 6 = 62) 

Stage 3. 6 ~ tt(0). 

2.2 The Gaussian approximation 

The Gaussian approximation to densities of the form 



plays a important role in INLA, where gi{xi) is a function of Xi that may depend on yj and 
0, and / is an index set. Hence in this section we describe one of the many possible ways to 
approximate ([2]) by a Gaussian distribution. 

We can perform a Taylor expansion up to second order of gi(xi) around an initial guess fj,^ 



where b% and Cj depend on ii\ , and then a Gaussian approximation is obtained with precision 
matrix Q + diag(c) and mode given by the solution of {Q + diag(c)}/i^ 1 ^ = b, where b and c 
are vectors formed by b^s and c^s respectively. This process is repeated until it converges to 
a Gaussian distribution with, say, mean x* and precision matrix Q* = Q + diag(c*), where 
c* = c(/x*), which we denote hereafter by itg(x\6, y). 

2.3 The INLA method 

For the hierarchical model described in Section 12.11 the joint posterior distribution of the un- 
knowns then reads 




9i(xi) ~ ') + hxi - - 




Tr(x,e\y) oc Tr(e)TT(x\e)Y\^(yi\xi,9) 



1=1 



oc7r(0)|Q(0)r/ 2 exp - -x 



T 



Q{0)x + 5^1og{7r(2/ i |x i ,0)} 



i=l 
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The approximated posterior marginals of interest Tr(xi\y), i = l,..,n and n(9j\y), j 
returned by INLA has the following form 



7t( Xi \y) = Y / H^\o (k \y)H0 {k) \y) A0« 

k 

a(o s \v) = J Tt(e\y)dOj 



l,...,m 

(3) 
(4) 



where {n(0^\y)} are the density values computed during a grid exploration on Tr(0\y). Since 
we don't have 7r(0\y) evaluated at all points required to compute the integral in Eq. Q we 
construct an interpolation l(0\y) using the density values {n{0^\y)} computed during the grid 
exploration on Tr(0\y) and approximate (jU) by 



7T 



I(B\y)dB. 



(5) 



Looking at [ (J3j) — dSJ) ] we can see that the method can be divided into three main tasks, (1) 
propose an approximation n(0\y) to the joint posterior of the hyperparameters ir(0\y), (2) 
propose an approximation jr(xi\0, y) to the marginals of the conditional distribution of the 
latent field given the data and the hyperparameters ir(xi\0,y) and (3) explore fr(0\y) on a grid 
and use it to integrate out in Eq. Q and 0_j in Eq. ([5]). 

The approximation used for the joint posterior of the hyperparameters ir(0\y) is 



n(0\y) oc 



n(x,0,y) 



TT G (x\e,y) 



x=x*(6) 



(6) 



where TTG(x\0,y) is the Gaussian approximation (see Section [2.2[) to the full conditional of x, 
an d x*(0) is the mode o f the full conditional for x, for a given 6. Expression ([6]) is equivalent 



to 



Tiernev and Kadand (|1986l ) Laplace approximation of a marginal posterior distribution, and 



it is exact if 7r(x\y, 6) is a Gaussian. 

For 7r(xi\0, y), three options are available, and they vary in terms of speed and accuracy. The 
fastest option, 7VQ(xi\0,y), is to use the marginals of the Gaussian approximation 7TG(x\6,y) 
already computed when evaluating expression Q. The only extra cost to obtain -Kc(xi\0, y) is 
to compute the marginal variances from the sparse precision matrix of ttg(x\6, y). The Gaussian 
approximation often gives re asonable results, but the re can be errors in the location and/or errors 
due to the lack of skewness (|Rue and Martind . 120071 ) . The more accurate approach would be to 
do again a Laplace approximation, denoted by i^LA{xi\0,y), with a form similar to expression 



ifLA(xi\0,y) oc 



ir(x,0,y) 



Tr GG (xi\xi,6 : y) 



X—i — X — {&% )@) 



(7) 
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where iTGG( x i\ x ij 6, u) is the Gaussian approximation to X{\xi, 9, y and tc_j * (xj, 0) is the modal 
configuration. A third option nsLA(%i\Q, y), called simplified Laplace approximation, is obtained 
by doing a Taylor expansion on the numerator and denominator of expression ([7]) up to third 
order, thus correcting the Gaussian approximatio n for locati o n and skewness with a much lower 
cost when compared to i^LA{xi\G,y). We refer to lRue et al.l (|2009l ) for a detailed description of 



the Gaussian, Laplace and simplified Laplace approximations to n(xi\0,y) 



2.4 INLA and the importance of the Gaussian field 

The main challenge in applying INLA to latent models is that the method depends heavily on 
the latent Gaussian prior assumption to work properly, both from the computational point of 
view and from the choice of approximations used as described in Section 12.31 

For the full conditional of x to be well approximated by a Gaussian distribution in equations 
(0) and (|7j). we need it to be well behaved and close to a Gaussian. This is basically ensured by 
the latent Gaussian prior that is assigned to x (see Stage 2 of Section 12. ip in latent Gaussian 
models, which has a non-negligible effect on the posterior, especially in terms of dependence 
between the components of x. 

Another important issue is that the conditional independence properties often encountered 
in the latent field translates into a sparse precision matrix when it is Gaussian distributed. This 
imply a huge decrease in computational time when performing the Gaussian approximation, 
which is extremely important since a Gaussian approximation needs to be computed for each 
value 0^ used on the grid for the numerical integration in Eq. @. 



3 Extension to near-Gaussian latent models 

In this section we show how to extend the scope of INLA to include models similar in structure 
to the latent Gaussian models described in Section \2. II but where the prior for some components 
of the latent field can have a near-Gaussian distribution. Section 13.11 will define these latent 
models in general while Section 13.21 will present an specific example, namely a survival model 
with Gamma frailty. 

3.1 Near-Gaussian latent models 

The models we are interested in this paper has the same structure as the latent Gaussian 
models described in Section 12.11 with the exception that the latent field has some independent 
non-Gaussian components. We redefine stage 2 of the hierarchical model of Section f2.il as 
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Stage 2 new . (x G , x NG ) \G 2 ~ it{x\0 2 ) = M(x G ; 0, Q 



{e 2 ))xY[ i TT{x NGi \e 2 ), 



x 



where x G and x^c represent the Gaussian and non-Gaussian terms of the latent field, respec- 
tively. In addition we assume that x^ G is formed by independent random variables. As a result, 
the distribution of the latent field is not Gaussian anymore, which precludes the use of INLA to 
fit this class of models. 

The term near-Gaussian latent models refer to the restrictions we impose on the non- 
Gaussian components of the latent field. We aim for non-Gaussian distributions that are not 
to different from a Gaussian one, and are so that they could be well enough approximated by 
a Gaussian density (at least locally). Unfortunately, it is not very useful to make a precise 
definition of this property, as what matters in the end is how our new approximation scheme 
performs in practice. We hope that it all becomes more clear and intuitive through out the rest 
of Section [3l 

A first approach to fit these models with INLA would be to include the non-Gaussian compo- 
nents xn G in the hyperparameters 9 of the model. However, this is not a good idea in practice 
since the size of x^ G is usually large and typically increase as the number of data points. This 
naive approach would lead to accurate results but the cost would be a large increase in compu- 
tational time due to the grid exploration necessary to compute Eqs. ([3])- ([5]). Our approach will 
deliver accurate results without the burden in computational time. 

3.2 Survival model - A first example 

Consider the following exponential model that can be used to analyze survival data that comes 
from subjects of the same group who are related to each other or from multiple recurrence times 
of a event for the same individual. The likelihood 



is given by independent exponential distributions given A = {Xij, i = 1,...,J, j = 1,..., J}, 
where Xij = 1//% and /i™ is the mean of tij. The index / could be interpreted as the number 
of groups in the data, while J would be the number of individuals in each group. This is a case 
of balanced data-set, but the unbalanced case could be treated just as easy by our method. 

It is expected that individuals belonging to the same group are correlated with each other, 
this can be included in the model through the addition of random effects w = {w\, ...,wi} to 
account for variation between groups, 




(8) 




(9) 
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Besides the random effects, it is common to include some covariate effects that in our case 
are represented by the fixed effects /3q and In the survival analysis literature, the random 
effects are often called frailty and it is common to assume that they are Gamma distributed, 
Wi ~ Gamma(K, k), with E(wi) = 1 to avoid identifiability issues. Gaussian priors are assumed 
for the fixed effects and a Gamma prior is often used for the random-effect hyperparameter n. 

The latent field for this model is given by x = (rj, (3, 6), with b = log(«j) and it is non 
Gaussian since b is formed by independent log-Gamma random variables, 

1 1 K 

tt(6|«) = n 7r ( 6 il«) = II ffr ex P{«( ft i - exp(6,))}. (10) 

i=\ i=l ^ 

Such a model cannot be applied straightforwardly using INLA since the assumption in Stage 2 
is violated. However, it fits the class of models in Section 13.11 and will be further analyzed in 
Section f4.il with our approach that we now describe. 

3.3 The extension 

It was described in Section 12.41 some points that explain the importance of the latent Gaussian 
prior for INLA to run smoothly. With that in mind we propose to approximate the prior of the 
non-Gaussian components t^(x^ g \0 2 ) by a Gaussian distribution tt g (xn G \6 2 ) and correct this 
approximation by the correction term 

CT = TT(x NG \e 2 )/TT G (x NG \e 2 ) (11) 

in the likelihood. This is, in fact, a way of writing a latent model of the form described in 
Section f3.il into a latent Gaussian model, defined in Section f2. II 

The first stage is now formed by the original likelihood multiplied by the correction term 

~]_Tr(y i \x i ,0 1 ) x Tr(x NG \0 2 )/Tr G (x NG \0 2 ). 

i=l 

Another way of writing this is to define an extended response vector z, with Zi = yi if i < iij 
and zi = if < i < n<i + k, where k is the length of x^ G and write 

Stage 1. z\x, 6 ~ k(z\x,G) = ]J^ k Tr(zi\xi, 6), where 

/ i «\ / v(Vi\x»Oi) forl<i<n d 

Tr{zi\Xi, 6) = < (12) 

[ ir(x NG) i\62)/TT G (x NG) i\0 2 ) for n d < i < n d + k 
It is important to emphasize that Stage 1 above is not the likelihood function, but expressing 
the model using this form will make the description and implementation of the algorithm that 
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follows easier. The latent field has now a Gaussian approximation replacing the non-Gaussian 
distribution of xjy G , 

Stage 2. [x G ,x NG \ \G 2 ~ ir(x\6 2 ) = Af(x G ; 0, Q~ l {0 2 )) x vr G (;EAr G |02), 

which means that now tt(x\6 2 ) is Gaussian distributed. The third stage is once again formed 
by the prior distribution on the hyperparameters, 

Stage 3. 6 ~ tt(0). 

Independent of the Gaussian approximation 7r G (:cAr G |#2) used, the hierarchical model above 
is equivalent to the model described in Section 1331 Considerations on how to chose this Gaussian 
approximation and how this model formulation will help us to perform inference will be presented 
soon, but first lets rewrite the survival model of Section 13.21 in this new formulation. 

Survival model (Cont.) 

For the survival model defined in Section 13. 2\ we have that the original likelihood function, 
defined in Eq. (JSJ), is an exponential distribution 

log7r(*ij|77i_j-) = rjij - expfajUj), 

where rftj is the linear predictor defined in Eq. ([9]) . Based on Eq. (|10j) the correction term (see 
Eq. pip) is given by 

n d +I 

ct= Yl 7T(x NGti \e 2 )/7r G (x NG)l \e 2 ) 

i=n d + l 
I 

= Y\_n(bi\n)/Tr G (bi\ti b ,T b ) = CT h 
i=i 

with 

log CTi = K{bi - exp(fe;)) + ^^{h - fi b (K)) 2 + const, (13) 

where ^(k) and Th(n) are the mean and precision parameter of the Gaussian approximation 
to the log-Gamma random effects b and const is a constant that does not depend on b. The 
latent field x = (77, b, (3) is now Gaussian since 7t(&|k) is approximated by 7r G (b\fi b (n), t&(«)) = 
Af(b; /ife(«;)l/, t^k) -1 //), where l n is a vector of ones with dimension n and I n is an n x n 
identity matrix. 

□ 
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We now have our latent (non-Gaussian) model of interest written as a latent Gaussian model 
and can adapt INLA to perform inference on such models. The main change is on the Gaussian 
approximation to the full conditional of the latent field (see Section 12 .2|) , that now takes the 
form 

, n d n d +k . 

ir(x\G,y) oc exp I - -x 1 Q(6)x + + ^2 h ^ x i) \ > ( 14 ) 

i=l n d +l ' 

where gi(xi) = log Tr(yi\xi, 9) as before and 

hi(xi) = logCTi = log7r(x NGj i\e 2 ) - logir G (x NG ,i\6 2 ). 

It was shown in Section 12.21 that a Gaussian approximation is obtained by approximating the 
non-quadratic functions, in this case gi(xi) and hi(xi), by quadratic functions using Taylor 
expansion up to second order. Once we know we are dealing with a well behaved log likelihood 
function gi{xi) as, for example, those belonging to the exponential family, the success of a 
Gaussian approximation to Eq. (|14p will depend heavily on the shape of hi(xi). For instance, 
it is desirable to have a bounded correction term 

t^Ng{-\0)/^g{-\0) < oo 

for a quadratic form approximation to hi(xi) to make sense. This imply that the Gaussian 
approximation ttq{xnc\6) should ideally dominate ttng( x ng\9) m the sense that it should have 
thicker tails than the non-Gaussian distribution it is trying to approximate. But in practice, 
it is sufficient to have a bounded correction term on the region that concentrates the bulk of 
probability mass since we can afford a bigger approximation error on the region that doesn't 
contribute much to the density (|14|) . 

Although not necessary, it is also desirable to have a log-concave correction term, at least 
on a neighborhood of the mode of Eq. ()14p . since it gives more stability to the optimization 
step required to find the mode of nG(x\8,y), which is necessary to compute Eq. ©. In our 
examples, we have chosen ttg(-\0) to be a Gaussian distribution with zero mean and low precision 
to approximate the distribution of the non-Gaussian components it]s[g(-\0)- 

Survival model (Cont.) 

For the survival model, we have a log-concave likelihood function as we can see in Figure [TJ 
where we have the plots of the log likelihood and the second derivative of the log likelihood for 
a given data point, assuming different values for the data point. 
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(a) 



(b) 



Figure 1: Plot of the log likelihood (a) and of the second derivative of the log likelihood (b) 
against the linear predictor for different values of data points for the survival model. It was used 
the following values for the data point: 0.5 (Dashed), 1 (Solid), 2 (Dotted) and 3 (Dot-dash) 

If we use a zero mean and low precision Gaussian distribution = and tj — > 0) in Eq. 
(|13p we also attain a log-concave correction term, 

log CTi = K,(bi — exp(hj)) + const 

illustrated in Figure [2l 




-4 -2 2 4 -4 -2 2 4 

rode rode 



(a) (b) 

Figure 2: Plot of the log correction term (a) and of the second derivative of the log correction 
term (b) against hi for different values of k for the survival model. It was used the following 
values for k: 0.5 (Dashed), 1 (Solid), 2 (Dotted) and 3 (Dot-dash) 

□ 
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Our approach approximates the latent field by a latent Gaussian field to take advantage of 
the benefits described in Section 12.41 at the cost of having a more complicated "likelihood" as 
displayed in Eq. (|12p . which will basically affect the Gaussian approximation in Eqs. © and 
([7]) . Performance considerations of this approach will be given along the examples of Section [J] 
and in the conclusion of the paper in Section [5j 

We now proceed to two examples where we apply the methodology proposed in this paper 
and compare the results with that obtained by MCMC. 



4 Examples 

4.1 Survival analysis with Gamma frailty 

Here we apply our proposed extension to fit the model defined in Section 13.21 in a simulated 
data-set and compare the results with that obtained by MCMC. For the experiment reported 
we have simulated 100 groups, each of which with 10 individuals. The covariates {xij} in Eq. 
d9j were simulated from a uniform distribution on the interval (0, 1) while the frailties came 
from a Gamma distribution with both the shape and rate parameters equal to 1. The fixed 
effects Po and f}\ were c hosen to be 1. 



We use OpenBUGS (jLunn et all 120091 ) to generate samples from the posterior distribution. 
Figure 3(a) show the posterior mean of the log frailties {6j : bi = log Wi, i = 1, 100} obtained 
by INLA (x-axis) and by MCMC (y-axis). An identity function is also plotted in order to help 
visualize the strong agreement between both methods. Figure [3(b) | display the histogram formed 
by the samples of 7r(6go|j/) returned by OpenBUGS and the line is the approximated posterior 
computed using our extension. This specific component was chosen at random, since similar 
accuracy was obtained for all log frailties in our simulation study. Figured show similar pictures 
for f3\ and k to show that the excellent results are also valid for the fixed effects and for the 
hyperparameter k of our model. The R code used to run INLA in this example is available in 
Appendix [Al 

It is important to note that the comparisons with MCMC were made with millions of samples, 
taking minutes to run, since for short to medium number of samples it was possible to visually 
detect errors in the MCMC estimates when compared to INLA, that took a little bit more than 
1 second to run on a Intel Core i5 with 2.67GHz. One can argue that the number of samples 
(and time) necessary by a MCMC scheme to attain the desired accuracy of our application 
could be reduced if more time was spent designing a specific MCMC scheme for this particular 
application instead of using the general purpose OpenBUGS. Although this is true in theory, we 
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(a) (b) 

Figure 3: Comparison between INLA and MCMC for the exponential gamma frailty example: 

(a) Plot of the posterior mean of the log frailties returned by INLA (x-axis) vs. MCMC (y-axis). 

(b) Approximate posterior density for logi^so obtained by INLA (solid line) and by MCMC 
(histogram). 



kappa betal 

(a) (b) 

Figure 4: Comparison between INLA and MCMC for the exponential gamma frailty example: 
(a) Approximate posterior density for k obtained by INLA (solid line) and by MCMC (his- 
togram) (b) Approximate posterior density for j3\ obtained by INLA (solid line) and by MCMC 
(histogram) 



are here comparing two general purpose tools for the class of latent models into consideration, 
and even with a tailored MCMC scheme, we believe that the difference in time will still be in 
orders of magnitude, not to mention the time necessary to develop specific algorithms for each 
new model belonging to this same class. 
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4.2 Robust mixed- effects models using Student-t distribution 

In this example, we show our method applied to a Bayesian random effects model where the 
random effects has an Student-t distribution. Assume we have data {y^ i = 1, ...,n} recorded 
for n groups each having ki individuals. Lets assume that y^s are independent Gaussian random 



vector described by the following standard mixed effect model (jLaird and Ward . Il982l ). useful 
to analyze repeated measures or grouped data, 

Vi = X l (3 + b i + e i , (15) 

where both the random effect bi and the error term ej have a Gaussian distribution, 6j ~ N(0, er^) 
and ej ~ N(0, o\T) with variances a\ and a\ respectively, being I a (ki x ki) identity matrix. 
Xi represent the (ki x p) design matrix for group i and f3 is a (p X 1) vector of fixed effects. 

Statistical inference based on the Gaussian distribution is known to be vulnerable to outliers. 
One approach to more robust modeling is to replace the Gaussian dist ribution by Student-t 
distribution in the model. In the context of linear mixed effects model, 
suggested to follow the robust statistical modeling approach described by 



Pinheiro et a 



Lange et al 



tail) 



19891 ) in 



which the Gaussian distributions of bi and ej are replaced by i-distributions, 

&i~i(0,^ 6 V), Gi ~t(0,ij 2 e I,v), (16) 

where and ^\ are the scale parameters and v is the common degree of freedom parameter. 
They also noted that in mixed effects models the outlier may occur either at the level of within- 
group error ej, called e-outliers, or at the level of random effects bi, called 6-outliers. This 
approach can be regarded as outlier accommodation although it provides useful information for 
outlier identification. For the simulation experiment performed later in this Section, we have 
used a Gamma prior with shape and rate parameters given by 1 and 0.1 respectively for the 
inverse scale parameters, a Gaussian distribution with mean zero and low precision (10~ 4 ) for 
the fixed effects and a Gaussian distribution with mean 3 and variance 1 for v* = \og(v — 5), so 
that the bulk of prior probability mass is between 7 and 150 for the degree of freedom parameter 
v. Note that we have defined v* so that v > 5 in order to get a well defined first four moments 
of the Student-t distribution. 

The model (|15p -(|16 p has the likelihood function formed by a t distribution 



Vij\x,0 ~t(rjijiij)l,v), i = 1, ...,n and j = 1, ...,ki, 
which does not belong to the exponential family, where rjij is the linear predictor 

r]ij = Xi(3 + h. 
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The latent field is then formed by x = (rj,b,(3), where b = {bf, i = l,...,n} is formed by 
independent t distributed random variables and has therefore a non-Gaussian distribution given 
by 

r(**i) 



t=l { 2' 



i=l 



1 + 



(17) 



If we use Eq. (jlip and (|17p we get the following log correction term 
log CI; = log 7r(6i|-0^, i/) - log TT G (bi \ fj, b ,r b ) 



+ 



log U + 



+ - W>) 2 + const. 



Again, if we assume a zero mean and low precision Gaussian distribution = and t& — > 0) 
we end up with 



log CI- 



(" + 1) 



log l + -f- 



+ const. 



Figure [5] show plots of the second derivative of the likelihood (Figure 5(a)) and of the log 



correction (Figure 5(b)) term assuming a data point y = 1, variances tp^, = I, tp^ = I and 
different values for v (y = 5, 10, 20, 50). 



(a) (b) 

Figure 5: (a) Plot of the second derivative of the log likelihood against linear predictor for ip^ 
= 1, y = 1 and different values of v and (b) plot of the second derivative of the log correction 
term against b{ for tp^ = 1 and different values of v for the t- mixed effect model. It was used 
the following values for v. 5 (Solid), 10 (Dashed), 20 (Dotted) and 50 (Dot-dash). 

We have here an example where both the likelihood and the correction term are not log- 
concave, specially for low values of degree of freedom parameter v. We can still apply our 
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extension to such models but one has to be careful when designing the optimization algorithm 
to optimize Eq. (|14p on such cases. 

Now we can proceed to a contamination study similar to that performed in 



Pinheiro et al 



(|200ll ). Here we have data simulated from 



Vi = Xfi + bi + ei, z = 1, ...,27, X 



1 8 

1 10 

1 12 

1 14 



(18) 



with the following mixture of Gaussian models being used to contaminate the distributions of 
the bi and the e,. 

iH (1 - Pb )M(0, a 2 ) + Pb fM(0, a 2 b ) 

(l-p e )Af(0,a 2 e )+p e fM(0,a 2 ), z = 1, -.,27, j = l,...,4 

where p b and p e denote, respectively, the expected percentage of b- and e-outliers in the data and 
/ denotes the contamination factor. The true parameters for the uncontaminated distributions 
are a 2 = 3 and a 2 = 2, while the true values for the fixed effects are ft = (12, 1) T . 

All 32 combinations of Pb,p e = 0, .05, .1, .25, and / = 2, 4 were used in the simulation study. 
The / = 2 case corresponds to a close contamination pattern, while / = 4 illustrates a more 
distant contamination pattern. A total of 500 Monte Carlo replications were obtained for each 
(j>b, Pe, f) combination. 

Let 6 denote a parameter of interest, with target value 9q ^ 0, estimated by 6, which in our 
case will be the posterior mean of 6. The efficiency of the Gaussian estimator Qq relative to the 
multivariate t estimator 9t is defined as the ratio of the respective mean square errors, 

E(e G - e ) 2 /E(e T - e ) 2 , (19) 

where expectations are taken with respect to the simulation distribution, that is E{6 — 8q) 2 = 
£-=i(^- 0o) 2 /5OO. 

We have chosen some data-sets out of the 32 x 500 = 16000 used in this contamination study 
and fitted the model using both MCMC and INLA to make sure that INLA is doing at least as 
good as MCMC in terms of accuracy. After that we proceed with the contamination study with 
INLA as the only estimation method as it would be impractical to fit all 16000 data-sets with 



MCMC. Figures [6] and [7] illustrate this comparison for one of the data-sets, where Figure 6(a) 



display the log random effects returned by INLA (x-axis) vs. MCMC (y-axis), while Figures 
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|6(b) | and [7] display the approximate posterior densities for log 77, = log and for the log fixed 
effects log/3o and log/3i respectively, obtained by INLA (solid line) and by MCMC (histogram). 




-3-10 1 2 

INLA 



(a) Plot of the posterior mean of the log random 
effects returned by INLA (x-axis) vs. MCMC (y- 
axis) . 

Figure 6: Comparison between INLA and 





-2-10 1 2 3 



(b) Approximate posterior density for log it, = 
log I /ipt obtained by INLA (solid line) and by 
MCMC (histogram) 

MCMC for the robust mixed effect model. 




(a) Approximate posterior density for log/3o ob- (b) Approximate posterior density for log/3i ob- 
tained by INLA (solid line) and by MCMC (his- tained by INLA (solid line) and by MCMC (his- 
togram) togram) 

Figure 7: Comparison between INLA and MCMC for the robust mixed effect model. 

Figures [8] and [9] plots the relative efficiencies, defined in Eq. (fl~9j) , between the posterior 
means of the t model over the Gaussian linear mixed effects model. Based o n the plots the 
conclu sion of our simulation study are, as expected, similar to that obtained by 



Pinheiro et al 



(|200ir ). There are substantial gains in efficiency for all parameters under the more distant 



contamination pattern (/ = 4) and moderate gains under the close contamination pattern 
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(/ = 2). The efficiency gains are bigger for the precision of the random effects and the non- 
monotonic behavior of the efficiency gains suggest that the t model is more robust than the 
Gaussian model especially for moderate percentage (5 — 10%) of outliers. The two methods 
have about the same efficiency under the no-contamination case. 




(a) fio under close contamination pattern (/ = 2) (b) Po under distant contamination pattern (/ = 4) 




5 10 15 20 25 5 10 IE 20 25 



(c) pi under close contamination pattern (/ = 2) (d) Pi under distant contamination pattern (/ = 4) 

Figure 8: Relative efficiencies (see Eq. (TT9j) ) of the t posterior mean with respect to the Gaussian 
posterior mean for the fixed effects in the linear mixed effect example. It plots the efficiency on 
the y-axis and p e on the x-axis. The meaning for the different types of lines are: (Solid line) 
Pb = 0%, (Dashed line) pb = 5%, (Dotted line) pb = 10% and (Dot-sash line) pb = 25%. 

5 Conclusion 

This paper extends the INLA method to latent models outside the scope of latent Gaussian 
models, where independent components of the latent field can have a near-Gaussian distribution. 
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(a) r e = l/(Je under close contamination pattern (b) r e = l/crf under distant contamination pattern 
(/ = 2) (/ = 4) 




(c) Tb = 1 /o"b under close contamination pattern (d) n, = 1 /a% under distant contamination pattern 
(/ = 2) (/ = 4) 

Figure 9: Relative efficiencies (see Eq. (|19p ) of the t posterior mean with respect to the Gaussian 
posterior mean for the precision parameters in the linear mixed effect example. It plots the 
efficiency on the y-axis and p e on the x-axis. The meaning for the different types of lines are: 
(Solid line) pb = 0%, (Dashed line) pb = 5%, (Dotted line) pb = 10% and (Dot-sash line) 
p b = 25%. 



It provides two examples of interest and compare the results obtained by our method with that 
returned by MCMC, showing similar accuracy with only a small fraction of computational time. 
We strongly believe that this extension will have a significant impact for the applied community, 
allowing them to check, for example, normality assumption that are usually taken for granted 
due to the lack of user friendly tools to fit more complex models accurately and in a reasonable 
amount of time. 
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Further work is required to evaluate which class of distributions can be used in the framework 
described in this paper with the same level of accuracy as the one displayed in our examples 
for the log-gamma and Student t distribution. It was our experience that the method gives 
very accurate results when the non-Gaussian components have distributions not so far from the 
Gaussian, hence the name near-Gaussian distributions. Distributions that have properties far 
from a Gaussian, such as those not defined on the real line (e.g. the exponential distribution) 
are not expected to be well approximated by the method described here. 

As mentioned in Section [3j it was not our interest to give a precise and restrictive definition 
of near-Gaussian distributions, and there are mainly two reasons for that. First, it is not easy, 
if at all possible, to come up with a formal definition where our proposed extension would be 
guaranteed to work for every family of distribution that is contained in such definition, specially 
because the INLA approach is know to fail in some cases, even withou t allowing for near-Gaussian 



components; see the response from the authors in lRue et al.l (|2009l ) . Second, our main interest 
is to see how our approximation scheme performs in practice, and for that purpose we believe 
that a possibly restrict definition here would do more harm than good. 

Another set of useful models that can be addressed by the approach described here and 
deserves further investigation are the class of dynamic models with non-Gaussian error term for 



the observation and/or system equation, as in lKitagawal (j 19871 ) for example. An ongoing project 



is to use the methodology described in this paper to develop a systematic approach to model 
selection that take advantage of the speed of our method to test standard assumptions of a given 
model. For example, this could be used by applied researchers that routinely use random effects 
models to verify the standard assumption of Gaussian distributed random effects. 

A R code - Survival model 



Following is the INLA code used in example 14.11 Please visit http://www.r-inla.org/ for 
more information about the R package INLA. 



# Simulate a dataset 
# 

n = 100 # number of groups 

m = 10 # number of individuals in the same group 

z = runif(n*m) # simulate covariate 

eta = 1 + z # linear predictor 

frailty = rgamma(n, 1, 1) # simulate frailties 
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y = rexp(n*m, rate = rep(frailty, each = m) * exp(eta)) # simulate data 

# INLA code 
# 

## Construct an extended response vector Y 
yy = inla. surv(c(y, rep(NA, n)), 

c(rep(l, n*m) , rep(NA, n) ) ) # Observation component 
ff = c(rep(NA, n*m) , rep(l, n)) # Frailty component, 

# any observation will do, like '1' 
Y = list(yy, ff) # extended response vector 
## Construct extended covariates and frailties 
intercept = c(rep(l, n*m) , rep(NA, n) ) # intercept 

zz = c(z, rep(NA, n) ) # covariate 

loggamma. frailty = c(rep(l:n, each=m) , l:n) # frailty 
## Model formula 

formula = Y " -1 + intercept + zz + 

f (loggamma. frailty, model="iid", 

hyper = list(prec = list(initial=-5, f ixed=TRUE) ) ) 
## prior for the frailty 

hyper . frailty = list(prec = list(param=c(l, 1))) 
## Run inla function 
rr = inla(f ormula, 

data = list(Y=Y, zz=zz, intercept=intercept , 

loggamma. frailty= loggamma. frailty) , 

family = c ( "exponential" , "loggammaf railty") , 

control. data = list(list(), list(hyper = hyper . frailty) ) , 

control . fixed = list(prec = list(default = 0.01)), 

control. inla = list (strategy = "laplace") 

) 
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